Atomistic simulations reveal impacts of missense mutations on the structure and function of SynGAP1

Abstract De novo mutations in the synaptic GTPase activating protein (SynGAP) are associated with neurological disorders like intellectual disability, epilepsy, and autism. SynGAP is also implicated in Alzheimer’s disease and cancer. Although pathogenic variants are highly penetrant in neurodevelopmental conditions, a substantial number of them are caused by missense mutations that are difficult to diagnose. Hence, in silico mutagenesis was performed for probing the missense effects within the N-terminal region of SynGAP structure. Through extensive molecular dynamics simulations, encompassing three 150-ns replicates for 211 variants, the impact of missense mutations on the protein fold was assessed. The effect of the mutations on the folding stability was also quantitatively assessed using free energy calculations. The mutations were categorized as potentially pathogenic or benign based on their structural impacts. Finally, the study introduces wild-type-SynGAP in complex with RasGTPase at the inner membrane, while considering the potential effects of mutations on these key interactions. This study provides structural perspective to the clinical assessment of SynGAP missense variants and lays the foundation for future structure-based drug discovery.


SynGAP-Solvent Molecular Dynamics Simulations
SynGAP-solvent molecular dynamics (MD) simulations were performed using AMBER22 [11].The simulations were done for the WT protein (Figure 1A) and 211 variants (Table S1), each running for 3 x 150 ns.The system was prepared using TLEAP in AMBERTOOLS22 [12], the protein model was immersed in explicit water molecules (~32,500) within a 15 Å box radius.Na + counter ions were added for neutralization when necessary.The total system, containing ~138,400 atoms, utilized AMBERff19SB and OPC force fields for the protein and solvent, respectively [13,14].
Before the production MD simulations, a two-phase 2,000-step minimization, a 250-ps heating stage and a three-stage 250-ps equilibration were performed.The protein side chains and solvent were minimized before performing the unrestrained minimization on the entire system using the steepest descent method for four cycles followed by the full conjugate gradient minimization.The heating stage was performed in the canonical ensemble (NVT) where the system was heated to 310 K, whereas the equilibration stages were done using the isothermal-isobaric ensemble (NPT).
During heating and equilibration, the restraint weight for the protein was lowered successively from 4 kcal/mol-Å 2 to 1.A time step of 2 fs and non-bonding cut-off of 10 Å were applied.The temperature and pressure were raised and maintained at 310 K and 1 bar using Langevin dynamics and isotropic Berendsen barostat [15,16], respectively.Electrostatic interactions were calculated with the Particle Mesh Ewald (PME) method [17], and hydrogen bond (or H-bond) lengths were constrained with the SHAKE algorithm [18].The WT system was also simulated at 400 K to assess the model's thermal stability.The AMBER input files are given in the Supplementary Information (SI).
Before ensuing with the in silico mutagenesis, the WT system (Figure 1B) was pre-processed using the abovedescribed multi-step MD protocol.The lowest total energy frame of three 100-ns WT replica production MD simulations was selected as the 3D template for the mutagenesis.This pre-processing was done to ensure that the WT system was well-equilibrated before proceeding with the mutagenesis, final system preparation, and the SynGAPsolvent simulation steps, including the three replicate 150-ns production simulations (Figure S4).Note that the WT system was re-processed similarly to the variant systems with multistep scheme but without the in silico mutagenesis step.

SynGAP-Membrane and SynGAP-Ras-Membrane Complex Molecular Dynamics Simulations
The WT SynGAP protein was also modelled in complex with the RasGTPase and membrane (Figure 3A) or only with the membrane (Figure S2A) using CHARMM-GUI (https://www.charmm-gui.org/;accessed 03/08/2023) and CHARMM36 ff [19].A three-component membrane model, composed of dilinoleicphosphatidylcholine (N = 352), dilinoleicphosphatidylethanolamine (N = 480) and dilinoleicphosphatidylserine [20,21] (N = 192; Figure S3), was selected to represent the inner post-synaptic cell membrane leaflet.Ras was Cys-palmitoylated at residues 181 and 184 (Figures 3A and S3).The SynGAP-membrane complex (683,370 atoms; Figure S2A) was solvated in a rectangular box with water thickness of 60 Å using the TIP3P water molecules and a total of 193 Na + counter ions were added to neutralize the system charge.The SynGAP-RasGTPase-membrane complex (560,059 atoms; Figure 3A) was prepared with water thickness of 40 Å and neutralized with 201 Na + counter ions.For comparison, the membrane-only system (467,656 atoms, 192 Na + ) was also prepared.These membrane-related systems (Figures S2A and 3A) were MD simulated using GROMACS2021 [22].Firstly, the system was minimized with no constraints using a 5,000-step steepest-descent approach.Secondly, the system was equilibrated in two phases for 250 ps at 310 K with a 1 fs time step using NVT simulations.Thirdly, 2,000 ps and 2 fs time step NPT simulations were performed in four additional equilibration phases using semi-isotropic Berendsen thermostat and barostat.Finally, fourthly, a constraint-free production MD simulation with 2 fs time step was performed.The pressure was set to 1 bar and the temperature to 310 K utilizing the Parrinello-Rahman barostat [23] and Nosé-Hoover thermostat [24], respectively.The electrostatic interactions were calculated with the Particle Mesh Ewald (PME) method [17] using a 10 Å cut-off.The H-bond distances were constrained using the LINCS algorithm [25].
The WT SynGAP-Ras-membrane system preparation (Figure 3A) required extra steps due to inclusion of the Cyspalmitoylated lipids.A 50-ns production MD simulation was done after performing the multi-step equilibration described above.Then, the covalently attached lipids of Ras were pulled into the membrane using a steered MD simulation.The lipids (C16 atoms) were pulled along the Z-direction with a 500-ps umbrella pulling using a distance increase of 0.01 nm ps -1 and a force constant of 2000 KJ mol -1 nm -2 .Lastly, an 80-ns production simulation was performed with the final system where the lipids anchored RasGTPase to the membrane; thus, adding up the simulation time to 130 ns.In contrast, the SynGAP-membrane (Figure S2A) and membrane only systems were simulated after the equilibration steps for 300 ns and 150 ns, respectively.The MD simulation input files are given in the SI.

Assessing Protein Stability Changes Induced by Missense Mutations
The relative free energy of folding (ΔΔG) for WT and missense variants of SynGAP was estimated using the empirical force field algorithm FoldX5.0 [26].Firstly, water and ions were stripped away, and a representative snapshot frame was extracted as a PDB file for every nanosecond in the 150-ns WT SynGAP-solvent simulations using CPPTRAJ in AMBERTOOLS22 [12].Secondly, the RepairPDB option in FoldX was used to optimize the 150 extracted structures via six energy minimization cycles using to ensure potential energy convergence.Thirdly, the BuildModel option in FoldX was used to generate 211 missense variants using each of the optimized WT snapshot structure as a 3D template and, finally, the energy difference between the variant and WT systems was calculated.The modified MD protocol for FoldX has been introduced in prior studies [27][28][29][30][31][32].The Gibbs free energy difference or ΔΔG between WT and variant protein is calculated with the following formula (Eq.1): Eq. 1

∆∆𝐺 = ∆𝐺 − ∆𝐺 [𝑘𝑐𝑎𝑙 ]
where ΔGWT stands for the free energy of the WT system and ΔGvariant for the free energy of the missense variant system.The exact energy terms are detailed in the study introducing the force field [33,34].Notably, the individual ΔΔG values are only comparable between the WT and variant systems that have the same structural context used in the calculation.The estimate of ΔΔG for each variant was calculated by averaging across all 150 frames and, finally, the averages for the three replica trajectories were also averaged (Table S1).
As mutations close to zero free energy difference (ΔG ±2 kcal/mol) cannot be classified with confidence [30], the cutoff values were chosen to be as inclusive as possible while maximizing the accuracy of the predictions.The energy cut-offs that determine if the variant could impact folding were set to ≥2 kcal and ≤-2 kcal for the destabilizing and stabilizing mutations, respectively, adapted based on suggestions from prior studies [30,31,35].Variants falling within the energy barrier range from -0.5 to 0.5 were considered potentially benign based on the calculated standard deviation of 0.46 kcal/mol [26] when comparing the results against the clinical data.Whereas variants falling in-between the energy-based categories were designated ambiguous.
Additionally, the ΔΔG between the WT-SynGAP and the missense variants were calculated using the Rosetta Cartesian ΔΔG protocol devised by Frenz et al. [36][37][38][39][40]. Rosetta protocols allow backbone flexibility and can generate and explore different conformational ensembles from a single starting structure, unlike FoldX that suffers from backbone rigidity during sampling [31,41].The 0 ns WT-SynGAP snapshot structure from each replica simulation was first relaxed using the full-atom FastRelax protocol [42].50 repeats were applied and the lowest energy relaxed structure was chosen for the further ΔΔG calculations with the missense variants [31,43].ΔΔG values were reported as the lowest energy difference between the relaxed WT and the Rosetta-generated variant ΔG values calculated by ref2015 scoring function in the cartesian space from five prediction iterations for each individual run [44,45].The energy values were converted from Rosetta Energy Units (REUS) to Kcal/mol using the scaling factor 2.94 [41,46].
The energy calculations were run using the caretsian2020_ref2015.yamland converted using aggregate.yamlprovided by the customizable PYTHON wrapper RosettaDDGPrediction available under GNU General Public License v3.0, (https://github.com/ELELAB/RosettaDDGPrediction)[41].The same cut-offs were applied for the pathogenicity prediction as in the FoldX-MD protocol.The Ω Gly-rich loop was excluded from the calculations due to its excessive movement.Note that the replica simulations 1, 2 and 3 (black, red, and blue lines) may converge or end at slightly different RMSD levels, but based on systematic analysis this is not more frequent with either benign or pathogenic variants.between 50 to 150 ns (frame/ns), were aligned against the 50 ns frame using the protein backbone atoms in VMD 1.9.4a12 [47].The Gly-rich Ω loop (residues 365-395) was excluded from the alignment due to its extreme volatility in both cases.(C) The RMSD calculations, which were done using CPPTRAJ in AMBERTOOLS22 [12] for the entire C2 domain, including also the Gly-rich Ω loop, indicate that the membrane association increases the C2 domain movement (orange line) and makes the membrane facing loop (Figure S2A) movement more dynamic in comparison to the solvent only simulation (blue line).In the solvent, the C2 domain loops can interact (e.g., H-bond, sterically stack) with each other stably, but with the membrane the loops are in constant interaction with the surface phospholipid head groups.(D) R324W (VUS), (E) S385P (VUS), and (F) W572S (Pathogenic).The plots are grouped into "Proline", "Pre-Proline" (one residue before Proline), "Glycine", and all other "General" amino acids to address different angle distributions of the amino acids.Note that the analysis was performed to all simulated SynGAP missense variants for 0 ns, 50 ns, 100 ns and 150 ns time steps, but the plots are shown here only for these six systems for brevity.and AUC (Area Under Curve) values, that were calculated using ROCKER0.1.4[50], show that Rosetta and FoldX-MD have comparable accuracy, albeit without statistically significant difference (Table S2).(D) The Venn diagram, that was drawn using DeepVenn (https://www.deepvenn.com/;accessed 10/07/2024)[51], indicates that predictions from FoldX-MD matched with ~73% of the Rosetta ones; excluding the ambiguous submissions.See Figure 5 for further details.
Table S2.The key hydrogen bonding residues of SynGAP in complex formation

Figure S2 .
Figure S2.SynGAP-membrane complex.(A) SynGAP model (cartoon model) interacting via its C2 domain (magenta) with the post-synaptic inner leaflet membrane model (stick models) in the molecular dynamics (MD) simulation at 300 ns.The highly flexible Gly-rich Ω loop (blue) finds conformation that stabilizes the C2 domainmembrane association.(B) The C2 domains are arranged from two β sheets that are composed of eight β antiparallel strands, forming the β sandwich.This arrangement differs between Type I (e.g., RIM2 in PDB: 2BWQ;[4]) and Type II (e.g., PTEN in PDB: 1D5R[5]) C2 domains.The SynGAP C2 domain belongs to the Type II category.Matching βstrands between the C2 domains are painted with the same colours.(C) An established membrane orientation of the C2 domain from a plant GTPase activating protein 1 (OsGAP1; PDB: 4RJ9[8,9]) was used as a basis for the positioning of the C2 domain of SynGAP (blue disc).

Figure S4 .
Figure S4.In silico mutagenesis and molecular dynamics simulation workflow for SynGAP missense variants in the solvent.Firstly, the solvated AlphaFold model, truncated to contain only the ≥50% reliable parts (res.198-730), was energy minimised, heated and equilibrated under constraint simulations before performing the free 100-ns production MD simulations.Secondly, the lowest energy simulation frame was selected for performing the in silico mutagenesis and then repeating all of the simulation steps of the initial model preparation.Note that the WT system was prepared otherwise similarly as the missense variants but only the mutagenesis step was omitted.

Figure S5 .
Figure S5.Root mean square deviation of protein backbone with a selected assortment of variants.The RMSD (Root Mean Square Deviation) was calculated for the entire 150 ns MD simulations using the first frame as a reference using CPPTRAJ in in AMBERTOOLS22[12].The RMSD values are plotted for the WT system and the missense variants, S235P (Likely Pathogenic), R324W (VUS), V400L (Benign), R475Q (Not in ClinVar) and I494R (Likely Pathogenic).The Ω Gly-rich loop was excluded from the calculations due to its excessive movement.Note that the

Figure S6 .
Figure S6.The effect of membrane on the SynGAP C2 domain stability and top loop movement.Two overlays of 100 MD simulation frames of the C2 domain structure are shown from (A) SynGAP solvent only simulation (cyan trace model) and (B) SynGAP-membrane complex (orange trace model) simulation.The frames, that were extracted

Figure S7 .
Figure S7.Normalized B-factors of protein backbone for a selected assortment of systems.The missense mutations prompt local changes in the SynGAP structure during the MD simulations, that are also reflected as increased fluctuation of the protein backbone.The normalized B-factors[48] were calculated for all frames included in the 150-ns production simulations.The normalized B-factors are shown with red lines for the following missense variants: V452F (VUS), L465P (Likely Pathogenic), M468T (VUS), R475Q (Not in ClinVar), I494R (Likely Pathogenic), and G502D (VUS).For comparison, the maximum normalized B-factor values of the three WT replica simulations were also plotted with blue lines.The extra peaks, caused by the mutations and not found in any of the replica WT simulations, are highlighted with black arrows focusing the attention on the local missense mutationinduced changes.For brevity and clarity, the results are only shown for one replica simulation where the effect of the mutation was apparent in the B-factors.

Figure S8 .
Figure S8.Lipid bilayer metrics for the membrane only and SynGAP-membrane systems in simulations.(A) The density curves for the phosphate atoms (orange line) indicate that the lipid bilayer thickness for the membrane only system was ~3.3 nm.(B) When the SynGAP protein was included to the membrane system, the bilayer thickness increased to ~3.5 nm based on the density curves (orange line).The thickness was calculated as the distance between the phosphate groups of the opposite leaflets using GMX DENSITY in GROMACS 2021 for the entire 150-ns and 300-ns simulation trajectory for A and B, respectively.(C) The membrane only (orange line) and SynGAP-membrane (blue line) systems generated similar area per lipid (APL) curves during the 150 ns simulations.The APL average value for the last 100 ns (50-150 ns) of the simulation trajectories was 0.57 for both the membrane only and SynGAPmembrane systems.The APL values were calculated based on the periodic boundary conditions (PBC) box X and Y coordinate changes using GMX ENERGY in GROMACS 2021.

Figure S10 .
Figure S10.2D protein secondary structure topology diagrams with B-factor colouring.The topologies, showing α helices (helical spring), β strands (arrows) and coils/loops/turns (lines), are shown for the last MD simulation frame or 150 ns snapshot using the 2D secondary structure diagrams.The colouring is done by normalized B-factor calculated over 150-ns MD simulation per residue[48].The topologies were generated using SSDraw (GitHub: https://github.com/ethanchen1301/SSDraw)[49].The diagrams are coloured with YlOrRd colormap in MATPLOTLIB using the side chain movement normalized B-factors calculated over the entire 150-ns simulations.In general, the substantial movement in the loops is indicated by the red colouring in all systems.The missense mutations did not generate consistent changes in the plotted secondary structure elements nor in the B-factor colouring over the WT system.However, the local changes induced by the mutations are indicated by the downward red arrows for the following systems: R248P (Likely Pathogenic), A469D (VUS), and I510S (Likely Pathogenic).Note that the analysis was performed to all 211 simulated SynGAP missense variants for 0 ns, 50 ns, 100 ns and 150 ns time steps, but the plots are shown only for these four systems for brevity.

Figure S11 .
Figure S11.The N-terminal PH domain section rotates in 400 K molecular dynamics simulations.(A) The PH domain rotated from the initial WT position (B) to an alternative position in the 150-ns MD simulation performed at temperature of 400 K.The PH domain and the rest of the SynGAP model are shown as pink and cyan cartoons, respectively.The C-and N-termini of the model are highlighted with white spheres.

Figure S12 .
Figure S12.Selected examples of SynGAP missense mutations.(A) The location of the missense variants L431P, R259W, I494R, C547R, C282R, A271D, and V400L is shown (orange spheres) on the SynGAP model structure (cyan cartoon).(B) In the L431P variant, Pro431 cannot retain the same backbone H-bond with His431 (grey) on the α helix (res.427-438) as the leucine of the WT protein.(C) In the R259W variant, Trp259 cannot form the salt bridges with Asp261 of the C2 domain, or Asp684 of the GAP domain similarly as WT Arg259 is able to.(D) In the WT protein, Ile494 on the α helix (res.489-519) packs hydrophobically with Met468 and Leu465 on the opposing α helix (res.460-477), while the large positive Arg494 side chain in the I494R variant disrupts the packing and weakens the helix integrity.(E) In the WT protein, Cys547 on the α helix (res.533-560)hydrophobically packs with Phe652 and Met649on the opposing α helix (res.641-666), while the large positive Arg547 disrupts this packing at the inter-helix space.(F)In the C282R variant, a large positively charged Arg282 is introduced into the hydrophobic core of the C2 domains, despite the implausibility of this scenario, only intra-protein distortion without large-scale unfolding or expulsion of the arginine from the core follows.(G) In the A271D variant, the carboxylate group of Asp271 on the β strand (res.259-272),unlike the methyl side chain of Ala271 in the WT protein, H-bonds with the nearby loop residues, thereby, interfering with the loop dynamics.H) Val400 and Leu400 on the β strand (res.398-411) both in the WT and in the V400L variant systems, respectively, favourably pack inside a hydrophobic niche.50 ns snapshots from MD simulations were used for the zoom-in views.The mutated residues are shown with orange backbone.

Figure S13 .
Figure S13.Rosetta binding stability calculations of SynGAP missense mutations.(A) The confusion matrix compares the neutral and destabilizing predictions of the Rosetta against the benign (or likely benign) and pathogenic (or likely pathogenic) variants from ClinVar.True positives and negatives are shown in blue, while false positives and negatives are in red.(B) The number of neutral predictions exceeds the known benign variants, yet more variants are classified as destabilizing than those identified as pathogenic.(C) ROC (Receiver Operating Characteristic) curves